Add rule-based fragmentation #17

jackgisby · 2022-07-14T11:00:13Z

Add rule-based fragmentation
Restructure modules
Switch to conda-incubator for setting up Actions CI
Remove unused dependencies

codecov · 2022-07-14T11:06:45Z

Codecov Report

Merging #17 (a52ef36) into dev (88fd297) will decrease coverage by 0.24%.
The diff coverage is 94.01%.

@@            Coverage Diff             @@
##              dev      #17      +/-   ##
==========================================
- Coverage   94.86%   94.62%   -0.25%     
==========================================
  Files           7        8       +1     
  Lines         955     1190     +235     
==========================================
+ Hits          906     1126     +220     
- Misses         49       64      +15

| Flag | Coverage Δ | |
|---|---|---|
| unittests | `94.62% <94.01%> (-0.25%)` | ⬇️ |

Flags with carried forward coverage won't be shown.

| Impacted Files | Coverage Δ | |
|---|---|---|
| metaboblend/databases/results.py | `91.36% <91.36%> (ø)` | |
| metaboblend/build_structures/build.py | `92.80% <92.80%> (ø)` | |
| metaboblend/build_structures/annotate.py | `92.98% <92.98%> (ø)` | |
| metaboblend/databases/connectivity.py | `96.03% <94.73%> (ø)` | |
| metaboblend/databases/substructures.py | `95.53% <96.28%> (ø)` | |
| metaboblend/__init__.py | `100.00% <100.00%> (ø)` | |
| metaboblend/algorithms.py | `100.00% <100.00%> (ø)` | |
| metaboblend/parse.py | `96.96% <100.00%> (+0.07%)` | ⬆️ |

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 88fd297...a52ef36.
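The headline percentages in the report can be reproduced from the Hits and Lines rows of the coverage diff. The snippet below is a minimal sketch of that arithmetic; the helper name `coverage_percent` is hypothetical and is not part of Codecov or metaboblend.

```python
def coverage_percent(hits, lines):
    """Return line coverage as a percentage of tracked lines."""
    return 100.0 * hits / lines


# Figures copied from the coverage diff above.
base_hits, base_misses, base_lines = 906, 49, 955      # dev (88fd297)
head_hits, head_misses, head_lines = 1126, 64, 1190    # #17 (a52ef36)

# Sanity check: hits plus misses should account for every tracked line.
assert base_hits + base_misses == base_lines
assert head_hits + head_misses == head_lines

base = coverage_percent(base_hits, base_lines)
head = coverage_percent(head_hits, head_lines)
delta = head - base

print(f"base={base:.4f}%  head={head:.4f}%  delta={delta:+.4f}%")
```

The unrounded change is a little under a quarter of a percentage point, which may be why the summary sentence and the diff table show 0.24% and 0.25% respectively; how Codecov rounds each figure is an assumption here.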

* Compatibility with conda version of geng; remove geng tool from package
* Incorporate pkl files into connectivity database
* Add nauty as dependency
* Add pickle as test dependency
* Switch from strings to pickles for connectivity graphs
* Use blob instead of text to store pickled dictionary
* No longer write substructures to .smi
* Add option to build to select only frequent substructures
* Add connectivity filter to k_configs
* Incorporate connectivity filter into MSn build method
* Build substructures for each set of masses independently
* Call itertools.product on substructures within multiprocessing portion of build
* Configure run script for current create_isomorphism_database inputs
* Built subsets should be empty list, not None
* Update variable names, remove debug options, update docstrings
* Add annotate_msn and generate_structures user functions
* Move stage at which multiprocessing step is performed
* Allow for multiple output options in build
* Remove ppm option for retrieving elemental composition from substructure db
* Allow list of mc/exact_mass to be passed to generate_structures
* Use TemporaryDirectory to store unittest results
* Let generate_structures return/yield smiles
* Implement build_msn to incorporate considerations for building structures from MS/MS
* Implement annotate_msn to provide an interface to build_msn
* Add/update build docstrings
* Remove unnecessary build parameters
* Pass data dictionary to user-facing build functions rather than separate mc, exact_mass, MSn masses
* Update variable naming conventions
* Add newline between smiles in out file
* Update SubstructureDb for removal of .pkl files
* Add function create_substructure_database
* Bring tests up to date with variable renaming
* Bring scripts up to date with variable renaming
* Simplify loading of test data and remove teardown
* Remove unused class ConnectivityDb and update SubstructureDb parameters
* Implement additional non-msn build tests
* Improve temporary table cleaning logic
* Fix issues with new build functions
* Allow tests to load auxiliary test data
* Implement msn tests and update k_config test for new parameter
* Correctly specify ppm in generate_structures
* Minor docstring and code reformatting
* Add binder dir
* Add example notebook
* Remove scripts
* Implement basic notebook
* Add small substructures to database prior to msn annotation
* Complete notebook example
* Fix logic for when smi_out_dir is None
* Rename example_msms.ipynb to workflow.ipynb
* Add pip to install metaboblend
* Add data dir, remove databases dir, move test data to data dir
* Write notebook databases to notebook_data
* Unzip test data
* Simplify test paths
* Remove databases from gitignore
* Use test databases for notebook
* Implement simple hydrogenation rules
* Get bond types rather than number of available atoms for hydrogen rule calculations
* Don't count dummy atoms for bond type calculations
* Remove dummy atom mass
* Use max_degree of 6 and 2 available_atoms by default for create_substructure_database
* Account for the fact we use neutral peaks (i.e. have removed adduct ion)
* Modify hydrogen re-arrangement rules for double bonds
* Update databases tests
* Implement test for calculate_possible_hydrogenations using reference numbers
* Add test for calculate_hydrogen_rearrangements
* Update hydrogen re-arrangement calculation function documentation
* Update remaining unit tests
* Add hydrogen re-arrangement compound HMDB XMLs
* Record even substructures
* Record even substructures in results DB
* Add indexes to improve combine_ecs function performance
* Improve results DB hierarchy and implement aggregation of scoring metrics
* Define SQLite functions to calculate scores via queries alone
* Record max BDE in spectra results table
* Calculate frequency in the absence of scores (for non-MSn method)
* Retain substructures does not cause substructures not to be initially recorded
* Add additional scoring metrics
* Update results db test data
* Define ppm error and valence of fragment prior to re-ordering
* Configure checks on recording of putative structure information
* Calculate scores at substructure combination level
* Convert True to 1 and False to 0 for conversion to SQLite boolean type
* Index results DB
* Use a loop in place of pool.map
* Minor performance improvements
* Merge minor performance improvements
* Use the minimum absolute error for getting possible fragment ions
* Add separate absolute error options for MSn peak and full structure
* Use 0.005 for abs_error_precursor
* Drop indexes before inserting into results DB
* Add results table index on ms_id_num and structure_smiles
* Update results DB tests
* Add table for generating unique structure smiles IDs
* Calculate cosine spectrum similarity
* Allow for the specification of weights for the results database scoring calculations
* Aggregate structure scores but force floating point division
* Select fragment and substructure id when calculating results scores for the correlated query
* Update results DB tests with updated scores
* Don't create indexes until structure scoring
* Don't include valence=0 substructures in the substructure database
* Add max BDE parameter for building
* Remove redundant connectivity graphs
* Update data to test filter records function
* Update dictionary pickle with Python 3.7
* Update file header
* Update contact information
* Update setup.py
* Update tests for RDKit changes
* Update README
* Keep functioning buttons
* Update testing workflow
* Use python 3.7
* Remove unused dependencies
* Use only the channel conda-forge
* Add pillow and pyqt dependencies
* Remove list definition in function arguments
* Add algorithms test
* Merge database tests into single file
* Restructure modules
* Restructure tests
* Update outdated imports
* Omit notebooks from coverage

Co-authored-by: Ralf Weber <[email protected]>
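Several entries in the squashed commit message concern storing pickled connectivity graphs in SQLite as a BLOB rather than as text. The snippet below is a minimal sketch of that general pattern using only the standard library; the table name `graphs`, its columns, the helper functions and the example dictionary are all hypothetical and do not reflect the actual metaboblend schema.

```python
import pickle
import sqlite3


def store_graph(conn, graph_id, graph):
    """Pickle a Python object and store the bytes in a BLOB column."""
    payload = pickle.dumps(graph)  # bytes are written as a BLOB by sqlite3
    conn.execute(
        "INSERT INTO graphs (id, graph) VALUES (?, ?)", (graph_id, payload)
    )


def load_graph(conn, graph_id):
    """Fetch the BLOB for graph_id and unpickle it, or return None."""
    row = conn.execute(
        "SELECT graph FROM graphs WHERE id = ?", (graph_id,)
    ).fetchone()
    return None if row is None else pickle.loads(row[0])


# Hypothetical schema, used for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE graphs (id INTEGER PRIMARY KEY, graph BLOB)")

example = {"nodes": 3, "edges": [(0, 1), (1, 2)]}
store_graph(conn, 1, example)
restored = load_graph(conn, 1)
print(restored == example)
```

Pickled data should only be loaded from a trusted database, since unpickling can execute arbitrary code.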

jackgisby added 30 commits May 30, 2020 17:59

Compatibility with conda version of geng; remove geng tool from package

def547e

Incorporate pkl files into connectivity database

9988906

Add nauty as dependency

21025ff

Add pickle as test dependency

b79ecf1

Switch from strings to pickles for connectivity graphs

f180962

Use blob instead of text to store pickled dictionary

00c6f8f

No longer write substructures to .smi

bad61eb

Add option to build to select only frequent substructures

24ff1e3

Add connectivity filter to k_configs

8e425a5

Incorporate connectivity filter into MSn build method

4d3a57b

Build substructures for each set of masses independently

8cf9419

Call itertools.product on substructures within multiprocessing portion of build

fdde9ba

Configure run script for current create_isomorphism_database inputs

83c1885

Built subsets should be empty list, not None

48d1d17

Update variable names, remove debug options, update docstrings

7539896

Add annotate_msn and generate_structures user functions

92ffc9d

Move stage at which multiprocessing step is performed

fa9e6d7

Allow for multiple output options in build

0ab5349

Remove ppm option for retrieving elemental composition from substructure db

59989e5

Allow list of mc/exact_mass to be passed to generate_structures

5d30eeb

Use TemporaryDirectory to store unittest results

602d7ba

Let generate_structures return/yield smiles

9024b19

Implement build_msn to incorporate considerations for building structures from MS/MS

5123b77

Implement annotate_msn to provide an interface to build_msn

f581da5

Add/update build docstrings

dbd5ea9

Remove unnecessary build parameters

fdcc286

Pass data dictionary to user-facing build functions rather than separate mc, exact_mass, MSn masses

e8ccd9d

Update variable naming conventions

e906068

Add newline between smiles in out file

80efc58

Update SubstructureDb for removal of .pkl files

7c72240

jackgisby added 27 commits March 25, 2021 13:51

Select fragment and substructure id when calculating results scores for the correlated query

15906f2

Update results DB tests with updated scores

017603a

Don't create indexes until structure scoring

f932cce

Don't include valence=0 substructures in the substructure database

504a573

Add max BDE parameter for building

760d039

Remove redundant connectivity graphs

4c57c0e

Update data to test filter records function

4a886a5

Update dictionary pickle with Python 3.7

135b6d7

Add notebook

85508f6

Update file header

2e9957f

Update contact information

fd8a8e1

Update setup.py

d62fa8d

Update tests for RDKit changes

54b60ad

Update README

197c5fa

Keep functioning buttons

010c0f1

Update testing workflow

022f0c4

Use python 3.7

c1d1bcd

Remove unused dependencies

a727907

Use only the channel conda-forge

a121159

Add pillow and pyqt dependencies

5145126

Remove list definition in function arguments

9718142

Add algorithms test

3b3e4df

Merge database tests into single file

c78a039

Restructure modules

6c1329a

Restructure tests

64ee9dc

Update outdated imports

2bcbf7d

Omit notebooks from coverage

a52ef36
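The commit "Remove list definition in function arguments" (9718142) presumably refers to avoiding mutable default arguments; that reading is an assumption, since the diff is not shown here. The snippet below is a minimal sketch of the general Python pitfall with hypothetical function names, not code from metaboblend.

```python
def append_shared(item, bucket=[]):
    """Pitfall: the default list is created once and shared across calls."""
    bucket.append(item)
    return bucket


def append_fresh(item, bucket=None):
    """Safer: use None as the default and create a new list per call."""
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket


first_shared = append_shared(1)
second_shared = append_shared(2)  # mutates the same default list again

first_fresh = append_fresh(1)
second_fresh = append_fresh(2)    # independent list each time

print(second_shared, first_shared is second_shared)
print(first_fresh, second_fresh)
```

Using `None` as the sentinel keeps each call independent while still letting callers pass their own list.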

jackgisby requested a review from RJMW August 8, 2022 13:01
